Smart Paper Notes

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Translated by Google Translate

Scaling Instruction-Finetuned Language Models

Hyung Won Chung , Le Hou , Shayne Longpre , Barret Zoph , Yi Tay , William Fedus , Yunxuan Li , Xuezhi Wang , Mostafa Dehghani , Siddhartha Brahma

Category: Machine Learning | Natural Language Processing

2022-10-20

Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.

Linear TreeShap

Peng Yu , Chao Xu , Albert Bifet , Jesse Read

Category: Machine Learning

2022-09-16

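Decision trees are well known for their ease of interpretation. To improve accuracy, we need to grow deep trees or ensembles of trees. These are hard to interpret, which offsets their original benefit. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting of features that is independent of the tree structure. The rise in popularity is mainly due to TreeShap, which solves a problem of generally exponential complexity in polynomial time. Following widespread adoption in industry, more efficient algorithms are needed. This paper presents a more efficient and more straightforward algorithm: Linear TreeShap. Like TreeShap, Linear TreeShap is exact and requires the same amount of memory.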
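
To make the exponential cost mentioned in the abstract concrete, here is a minimal sketch of the brute-force Shapley value computation that enumerates every feature coalition. It is not the TreeShap or Linear TreeShap algorithm; the function names and the toy value function are assumptions made for illustration only.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions.

    The number of coalitions grows as 2^(n_features - 1) per feature,
    which is the exponential cost that tree-specific algorithms avoid.
    """
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * marginal
    return phi


def toy_value(coalition):
    """Hypothetical value function: two additive terms plus an interaction."""
    value = 0.0
    if 0 in coalition:
        value += 1.0
    if 1 in coalition:
        value += 2.0
    if 0 in coalition and 1 in coalition:
        value += 4.0  # interaction, split equally between features 0 and 1
    return value


phi = shapley_values(toy_value, 3)
print(phi)
```

In this toy case the attributions sum to the value of the full coalition minus the value of the empty one, and the unused third feature receives zero.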

Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning

Homer Walke , Jonathan Yang , Albert Yu , Aviral Kumar , Jedrzej Orbik , Avi Singh , Sergey Levine

Category: Robotics | Machine Learning

2022-07-11

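Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems. In practice, however, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment. Moreover, robotic policies learned with RL often fail when deployed outside the setting in which they were learned. In this work, we study how these challenges can be tackled by making effective use of diverse offline datasets collected from previously seen tasks. When faced with a new task, our system adapts previously learned skills to quickly learn both to perform the new task and to return the environment to an initial state, effectively performing its own environment reset. Our empirical results show that incorporating prior data into robotic reinforcement learning enables autonomous learning, substantially improves the sample efficiency of learning, and leads to better generalization.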

A Sequential Quadratic Programming Method with High Probability Complexity Bounds for Nonlinear Equality Constrained Stochastic Optimization

Albert S. Berahas , Miaolan Xie , Baoyu Zhou

Category: Machine Learning (Statistics)

2023-01-01

A step-search sequential quadratic programming method is proposed for solving nonlinear equality constrained stochastic optimization problems. It is assumed that constraint function values and derivatives are available, but only stochastic approximations of the objective function and its associated derivatives can be computed via inexact probabilistic zeroth- and first-order oracles. Under reasonable assumptions, a high-probability bound on the iteration complexity of the algorithm to approximate first-order stationarity is derived. Numerical results on standard nonlinear optimization test problems illustrate the advantages and limitations of our proposed method.

Proximal Policy Optimization with Graph Neural Networks for Optimal Power Flow

Ángela López-Cardona , Guillermo Bernárdez , Pere Barlet-Ros , Albert Cabellos-Aparicio

Category: Artificial Intelligence

2022-12-23

Optimal Power Flow (OPF) is a very traditional research area within the power systems field that seeks the optimal operation point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In the last decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so searching for better and faster ACOPF solutions is becoming crucial. The appearance of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on the IEEE 30-bus system and then computing the OPF on that base network with topology changes.

Rule Learning by Modularity

Albert Nössig , Tobias Hell , Georg Moser

Category: Machine Learning

2022-12-23

In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with traditional methods in rule learning to provide efficient and scalable algorithms for the classification of vast data sets, while remaining explainable. Apart from evaluating our approach on the common large scale data sets MNIST, Fashion-MNIST and IMDB, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with Allianz Private Krankenversicherungs-Aktiengesellschaft which is an insurance company offering diverse services in Germany.

RouteNet-Fermi: Network Modeling with Graph Neural Networks

Miquel Ferriol-Galmés , Jordi Paillisse , José Suárez-Varela , Krzysztof Rusek , Shihan Xiao , Xiang Shi , Xiangle Cheng , Pere Barlet-Ros , Albert Cabellos-Aparicio

Category: Artificial Intelligence | Machine Learning

2022-12-22

Network models are an essential block of modern networks. For example, they are widely used in network planning and optimization. However, as networks increase in scale and complexity, some models present limitations, such as the assumption of markovian traffic in queuing theory models, or the high computational cost of network simulators. Recent advances in machine learning, such as Graph Neural Networks (GNN), are enabling a new generation of network models that are data-driven and can learn complex non-linear behaviors. In this paper, we present RouteNet-Fermi, a custom GNN model that shares the same goals as queuing theory, while being considerably more accurate in the presence of realistic traffic models. The proposed model predicts accurately the delay, jitter, and loss in networks. We have tested RouteNet-Fermi in networks of increasing size (up to 300 nodes), including samples with mixed traffic profiles -- e.g., with complex non-markovian models -- and arbitrary routing and queue scheduling configurations. Our experimental results show that RouteNet-Fermi achieves similar accuracy as computationally-expensive packet-level simulators and it is able to accurately scale to large networks. For example, the model produces delay estimates with a mean relative error of 6.24% when applied to a test dataset with 1,000 samples, including network topologies one order of magnitude larger than those seen during training.
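
The accuracy figure quoted above is a mean relative error. As a rough illustration of how such a metric is commonly computed, the sketch below averages the absolute relative deviation between predicted and measured delays. The exact definition used by the authors is not given in the abstract, so this formula, the function name, and the sample values are assumptions.

```python
def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / |actual| over all samples.

    Assumed definition for illustration; the paper may normalize differently.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        if a == 0:
            raise ValueError("relative error is undefined for a zero target")
        total += abs(p - a) / abs(a)
    return total / len(actual)


# Hypothetical per-path delays in seconds, not data from the paper.
measured_delays = [1.0, 2.0, 3.0]
predicted_delays = [1.1, 1.9, 3.3]
mre = mean_relative_error(predicted_delays, measured_delays)
print(f"mean relative error: {mre:.2%}")
```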

Shakebot: A Low-cost, Open-source Shake Table for Ground Motion Seismic Studies

Zhiang Chen , Devin Keating , Yash Shethwala , Aravind Adhith Pandian Saravanakumaran , Ramon Arrowsmith , Chris Madugo , Albert Kottke , Jnaneshwar Das

Category: Robotics

2022-12-21

Our earlier research built a virtual shake robot in simulation to study the dynamics of precariously balanced rocks (PBR), which are negative indicators of earthquakes in nature. The simulation studies need validation through physical experiments. For this purpose, we developed Shakebot, a low-cost (under $2,000), open-source shake table to validate simulations of PBR dynamics and facilitate other ground motion experiments. The Shakebot is a custom one-dimensional prismatic robotic system with perception and motion software developed using the Robot Operating System (ROS). We adapted affordable and high-accuracy components from 3D printers, particularly a closed-loop stepper motor for actuation and a toothed belt for transmission. The stepper motor enables the bed to reach a maximum horizontal acceleration of 11.8 m/s^2 (1.2 g), and velocity of 0.5 m/s, when loaded with a 2 kg scale-model PBR. The perception system of the Shakebot consists of an accelerometer and a high frame-rate camera. By fusing camera-based displacements with acceleration measurements, the Shakebot is able to carry out accurate bed velocity estimation. The ROS-based perception and motion software simplifies the transition of code from our previous virtual shake robot to the physical Shakebot. The reuse of the control programs ensures that the implemented ground motions are consistent for both the simulation and physical experiments, which is critical to validate our simulation experiments.

Pretraining Without Attention

Junxiong Wang , Jing Nathan Yan , Albert Gu , Alexander M. Rush

Category: Natural Language Processing | Machine Learning

2022-12-20

Transformers have been essential to pretraining success in NLP. Other architectures have been used, but require attention layers to match benchmark accuracy. This work explores pretraining without attention. We test recently developed routing layers based on state-space models (SSM) and model architectures based on multiplicative gating. Used together these modeling choices have a large impact on pretraining accuracy. Empirically the proposed Bidirectional Gated SSM (BiGS) replicates BERT pretraining results without attention and can be extended to long-form pretraining of 4096 tokens without approximation.
